Abstract

Comparative genomics provides a powerful tool to characterize the genetic differences among species that may be linked to their phenotypic variations. In the case of mosquito-associated Spiroplasma species, such an approach is useful for investigating their differentiation in substrate utilization strategies and putative virulence factors. Among the four species that have been assessed for pathogenicity by artificial infection experiments, Spiroplasma culicicola and S. taiwanense were found to be pathogenic, whereas S. diminutum and S. sabaudiense were not. Intriguingly, based on the species phylogeny, the association with mosquito hosts and the gain or loss of pathogenicity in these species appear to have evolved independently. Through comparison of their complete genome sequences, we identified the genes and pathways that are shared by all or specific to one of these four species. Notably, we found that a glycerol-3-phosphate oxidase gene (glpO) is present in S. culicicola and S. taiwanense but not in S. diminutum or S. sabaudiense. Because this gene is involved in the production of reactive oxygen species and has been demonstrated as a major virulence factor in Mycoplasma, this distribution pattern suggests that it may be linked to the observed differences in pathogenicity among these species as well. Moreover, through comparative analysis with other Spiroplasma, Mycoplasma, and Mesoplasma species, we found that the absence of glpO in S. diminutum and S. sabaudiense is best explained by independent losses. Finally, our phylogenetic analyses revealed possible recombination of glpO between distantly related lineages and local rearrangements of adjacent genes.

Introduction

Comparative analysis of gene content among related species with distinct phenotypes has provided a powerful tool to investigate the underlying genetic mechanisms. For example, examination of the presence and absence of genes between bacterial species that differ in pathogenicity can be used to identify putative virulence factors. This genome-scale screening is a high-throughput and cost-effective approach for narrowing down the list of candidate genes, which may greatly facilitate the downstream experimental verification and functional characterization. To demonstrate the utility of this comparative approach, the mosquito-associated Spiroplasma species provide a good study system.

The genus Spiroplasma contains a diverse group of wall-less bacteria that are mostly associated with various insect hosts (Whitcomb 1981; Gasparich et al. 2004; Regassa and Gasparich 2006; Gasparich 2010). To date, five characterized Spiroplasma species have been found to be associated with mosquitoes, including Spiroplasma culicicola (Hung et al. 1987), S. sabaudiense (Abalain-Colloc et al. 1987), S. taiwanense (Abalain-Colloc et al. 1988), S. cantharicola (Whitcomb et al. 1993), and S. diminutum (Williamson et al. 1996). All five of these Spiroplasma species belong to the Apis clade within the genus. Interestingly, examination of their serotypes, phylogenetic placements, and the host associations of other related species suggests that the associations with mosquitoes have multiple independent origins (Gasparich et al. 2004; Lo, Ku, et al. 2013; the phylogeny and host associations are summarized in fig. 1). Because of the interest in developing these mosquito-associated bacteria for biological control of insect pests, a series of artificial infection experiments has been performed to examine the pathogenicity of these Spiroplasma species (Chastel and Humphery-Smith 1991; Humphery-Smith, Grulet, Chastel, et al. 1991; Humphery-Smith, Grulet, Le Goff, et al. 1991; Vorms-Le Morvan et al. 1991; Vazeille-Falcoz et al. 1994; Phillips and Humphery-Smith 1995). Based on these results, infection by S. taiwanense or S. culicicola increased the mortality of mosquitoes, whereas no significant effect was found for infection by S. diminutum or S. sabaudiense; the effects of S. cantharicola infection remain to be tested.

To investigate the genetic mechanisms that may explain these observed differences in artificial infection experiments, we have determined the complete genome sequences of S. taiwanense and S. diminutum for comparative analysis (Lo, Ku, et al. 2013). One main finding from this pairwise genome comparison is that S. taiwanense has a copy of glpO encoding a glycerol-3-phosphate (G3P) oxidase, while S. diminutum does not. Because this gene is involved in reactive oxygen species (ROS) production, the presence of this gene in the S. taiwanense genome provides an explanation for the observation of tissue damage (Phillips and Humphery-Smith 1995) and increased mortality (Humphery-Smith, Grulet, Chastel, et al. 1991; Humphery-Smith, Grulet, Le Goff, et al. 1991; Vazeille-Falcoz et al. 1994) in infected hosts. Moreover, functional characterizations have provided experimental evidence that this gene is the main virulence factor in the closely related Mycoplasma mycoides (Pilo et al. 2005, 2007) and the more distantly related M. pneumoniae (Hames et al. 2009).

However, several questions remained regarding the molecular evolution of glpO in mosquito-associated Spiroplasma species. For example, was the gene gained in the lineage leading to S. taiwanense or lost in S. diminutum? Do other Spiroplasma species possess this gene as well? To address these questions, we determined the complete genome sequences of the other two mosquito-associated Spiroplasma species that have been tested in artificial infection experiments, S. culicicola and S. sabaudiense, for more comprehensive comparative analyses in this study. Because the two species that have been found to be pathogenic (i.e., S. taiwanense and S. culicicola) do not form a monophyletic clade when other mosquito-associated species are considered (Lo, Ku, et al. 2013), this expansion in taxon sampling provides us with the opportunity to investigate the possibility of multiple independent gains or losses of putative virulence factors in these bacteria. Additionally, the recent increased availability of complete genome sequences from other Spiroplasma species (Ku et al. 2013, 2014) has improved our ability to establish the ancestral states of gene content and to perform molecular phylogenetic inference. Overall, we aim to improve our understanding of the substrate utilization strategy and putative virulence factors in mosquito-associated Spiroplasma species.

Fig. 1. Phylogeny of representative Spiroplasma species and the Mycoides-Entomoplasmataceae clade. The isolation source of the Spiroplasma species in the Apis clade is labeled after the species name. The four mosquito-associated species analyzed in this study are highlighted in bold. The phylogeny is based on Gasparich et al. (2004) and Lo, Ku, et al. (2013).



Materials and Methods 

Genome Sequencing 

The two bacterial strains sequenced in this study, S. culicicola AES-1ᵀ and S. sabaudiense Ar-1343ᵀ, were acquired from the American Type Culture Collection (ATCC catalog numbers 35112 and 43303, respectively). For the whole-genome shotgun sequencing of S. culicicola, one paired-end library (~160-bp insert; 151-bp reads; ~0.81-Gb raw reads) and one mate-pair library (~3-kb insert; 101-bp reads; ~1.85-Gb raw reads) were prepared and sequenced using the Illumina HiSeq 2000 platform (Illumina, USA). For S. sabaudiense, one paired-end library (~311-bp insert; 251-bp reads; ~0.65-Gb raw reads) was sequenced using the Illumina MiSeq platform (Illumina).

The procedures for genome assembly and annotation were based on those described in our previous studies (Chung et al. 2013; Lo, Chen, et al. 2013). For S. culicicola, the initial de novo genome assembly was performed using ALLPATHS-LG release 42781 (Gnerre et al. 2011) because of the availability of mate-pair reads. For S. sabaudiense, the assembly was performed using VELVET v1.2.07 (Zerbino and Birney 2008). PCR primer walking and Sanger sequencing were used for gap filling and assembly verification. After the genomes were sequenced to completion, all raw reads were mapped to the final assembly by BWA v0.7.4 (Li and Durbin 2009) for variant check by SAMTOOLS v0.1.19 (Li et al. 2009) and for visual inspection by IGV v2.1.24 (Robinson et al. 2011). The programs RNAmmer (Lagesen et al. 2007), tRNAscan-SE (Lowe and Eddy 1997), and PRODIGAL (Hyatt et al. 2010) were used for gene prediction. The gene names and product descriptions were annotated based on the orthologous genes identified by OrthoMCL (Li et al. 2003) in previously published Spiroplasma genomes (Ku et al. 2013; Lo, Chen, et al. 2013; Lo, Ku, et al. 2013; Ku et al. 2014), including S. apis (GenBank accession number CP006682), S. chrysopicola (CP005077), S. diminutum (CP005076), S. melliferum (AMGI01000001-AMGI01000024), S. syrphidicola (CP005078), and S. taiwanense (CP005074-CP005075). The genes that lack identifiable orthologs were manually curated based on BlastP (Altschul et al. 1997; Camacho et al. 2009) searches against the NCBI nonredundant (nr) protein database (Benson et al. 2012). The signal peptides of putative secreted proteins were identified using SignalP v4.0 (Petersen et al. 2011) based on the Gram-positive bacteria model. To distinguish between secreted proteins and transmembrane proteins, the predicted signal peptides were removed before the identification of transmembrane helices using TMHMM v2.0 (Krogh et al. 2001). The Conserved Domain Database (Marchler-Bauer et al. 2013) was used to identify protein domains and to provide additional annotation information. The KAAS tool (Moriya et al. 2007) provided by the KEGG database (Kanehisa and Goto 2000; Kanehisa et al. 2010) was used to classify protein-coding genes into the COG functional categories (Tatusov et al. 1997, 2003). To visualize the genomes of the four mosquito-associated Spiroplasma species, the annotated chromosomes were plotted using CIRCOS (Krzywinski et al. 2009) to show gene locations, GC skew, and GC content.
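The GC skew and GC content tracks mentioned above are typically derived from sliding windows along the chromosome. The following is a minimal sketch of that calculation, not the procedure used in this study: the function name `gc_stats` and the window and step sizes are assumptions chosen for illustration, and GC skew is taken in its usual form, (G - C)/(G + C).

```python
def gc_stats(seq, window=1000, step=1000):
    """Return (start, gc_content, gc_skew) for each full window of a DNA sequence.

    The window and step sizes are illustrative assumptions, not values
    reported in the text.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # GC content: fraction of G and C among all positions in the window.
        gc_content = (g + c) / window
        # GC skew: (G - C) / (G + C); set to 0 when the window has no G or C.
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_content, gc_skew))
    return results


if __name__ == "__main__":
    # Toy sequence; a real analysis would read the chromosome from a FASTA file.
    for row in gc_stats("GGGCAAAA", window=4, step=4):
        print(row)
```

A sign change in the skew along the chromosome is the kind of signal commonly used to locate putative replication origins and termini.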

Comparative Analysis 

Two sets of comparative analyses were performed in this study. The first set includes the four mosquito-associated Spiroplasma species, all of which belong to the Apis clade within this genus. The second set expands the taxon sampling to include S. apis in the Apis clade, four Mycoplasma/Mesoplasma species in the sister Mycoides-Entomoplasmataceae clade (M. mycoides [BX293980], M. leachii [CP002108], M. putrefaciens [CP004357], and Me. florum [AE017263]), and two Spiroplasma species in the Chrysopicola clade as the outgroup (S. chrysopicola and S. syrphidicola). In these two sets of comparative analyses, the homologous gene clusters among the genomes being compared were identified by OrthoMCL (Li et al. 2003). The lists of homologous gene clusters were examined to investigate the patterns of gene presence and absence.
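Examining presence and absence patterns amounts to recording, for each homologous gene cluster, which genomes contribute at least one member, and then counting clusters per pattern. The sketch below illustrates this under stated assumptions: the input layout (a dictionary mapping a cluster ID to a list of genome and gene pairs), the function name `distribution_patterns`, and all cluster and gene identifiers are hypothetical and do not reflect the actual OrthoMCL output format.

```python
from collections import Counter


def distribution_patterns(clusters, species):
    """Count homologous gene clusters by the set of species that contain them.

    clusters: dict mapping a cluster ID to a list of (species, gene_id) pairs.
    species:  list of species names; fixes the order used in each pattern key.
    """
    counts = Counter()
    for members in clusters.values():
        present = {sp for sp, _gene in members}
        # A pattern is the tuple of species with at least one gene in the cluster.
        pattern = tuple(sp for sp in species if sp in present)
        counts[pattern] += 1
    return counts


if __name__ == "__main__":
    species = ["S. culicicola", "S. taiwanense", "S. diminutum", "S. sabaudiense"]
    # Hypothetical toy clusters; gene IDs are placeholders.
    clusters = {
        "cluster_1": [(sp, "gene_a") for sp in species],       # shared by all four
        "cluster_2": [("S. culicicola", "gene_b"),
                      ("S. taiwanense", "gene_c")],             # shared by two
        "cluster_3": [("S. sabaudiense", "gene_d"),
                      ("S. sabaudiense", "gene_e")],            # species-specific family
    }
    for pattern, n in distribution_patterns(clusters, species).items():
        print(n, pattern)
```

Counting by pattern in this way gives the numbers that a Venn-style summary of shared and species-specific clusters would display.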

For the four mosquito-associated Spiroplasma species, we utilized MUMmer v3.23 (Kurtz et al. 2004) for pairwise genome alignments. We increased the minimum match length (option "-l") to 24 from the default setting of 20 to reduce spurious hits. The chromosome of S. taiwanense was chosen as the reference to be compared with the other three species. To estimate the genome-wide sequence divergence levels, the single-copy orthologous genes shared by these four species were used for sequence alignment by MUSCLE v3.8 (Edgar 2004). The alignments of individual genes were concatenated for calculation of sequence similarities by the DNADIST and PROTDIST programs of PHYLIP v3.69 (Felsenstein 1989).
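The concatenation step and a pairwise similarity calculation can be sketched as follows. This is only an illustration under stated assumptions: it computes uncorrected percent identity over ungapped columns, which is not necessarily how DNADIST or PROTDIST output was converted to similarity in this study, and the function names, taxon labels, and toy sequences are hypothetical.

```python
def concatenate(alignments, taxa):
    """Join per-gene alignments (dicts of taxon -> aligned sequence) end to end."""
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}


def percent_identity(a, b, gap="-"):
    """Percentage of identical sites among columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Two toy gene alignments for a reference and a query genome.
    genes = [
        {"ref": "ATGC", "qry": "ATGA"},
        {"ref": "GG-C", "qry": "GGTC"},
    ]
    concat = concatenate(genes, ["ref", "qry"])
    print(round(percent_identity(concat["ref"], concat["qry"]), 1))
```

Pooling sites across genes before computing the statistic weights each gene by its aligned length, which is the practical effect of working from a concatenated alignment.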

For molecular phylogenetic inference, homologous genes from selected genomes were aligned using MUSCLE v3.8 (Edgar 2004). The maximum likelihood phylogenies were inferred using PhyML v3.0 (Guindon and Gascuel 2003). The proportion of invariable sites and the gamma distribution parameter were estimated from the data set, and the number of substitution rate categories was set to four. Bootstrap supports were estimated based on 1,000 replicates generated by the SEQBOOT program of PHYLIP v3.69 (Felsenstein 1989).
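Nonparametric bootstrap replicates of the kind described above are built by resampling alignment columns with replacement until the replicate has the same length as the original alignment. The sketch below illustrates that resampling step only; it is not SEQBOOT itself, and the function name `bootstrap_alignment`, the seed, and the toy alignment are assumptions for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length.

    alignment: dict mapping taxon name -> aligned sequence (all equal length).
    rng:       a random.Random instance, so that replicates are reproducible.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have equal length")
    # Draw column indices once and apply them to every taxon, so that each
    # resampled column keeps its residues together across taxa.
    columns = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][i] for i in columns) for n in names}


if __name__ == "__main__":
    rng = random.Random(42)  # arbitrary seed for reproducibility
    toy = {"taxon_1": "ACGT", "taxon_2": "ACGA"}
    replicates = [bootstrap_alignment(toy, rng) for _ in range(3)]
    for rep in replicates:
        print(rep)
```

Each replicate would then be passed to the tree inference program, and the support for a branch is the fraction of replicate trees that recover it.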

Results and Discussion 

Genome Sequences of Mosquito-Associated Spiroplasma Species

The genome assembly statistics and chromosomal organization of the four mosquito-associated Spiroplasma species are provided in table 1 and figure 2. Both newly sequenced species contain a circular chromosome that is approximately 1.1 Mb in size (S. culicicola: 1,175,131 bp; S. sabaudiense: 1,075,953 bp). These genome sizes are slightly smaller than those reported previously based on pulsed-field gel electrophoresis (Carle et al. 1995). The values for GC content are similar to those estimates based on the buoyant density method (Abalain-Colloc et al. 1987; Hung et al. 1987).

The availability of these two additional Spiroplasma genomes, together with the previously established species phylogeny (Lo, Ku, et al. 2013), provided several insights into the genome evolution in the Apis clade. For example, the low GC content, low coding density, and high frequency of pseudogenes observed in the S. taiwanense genome (Lo, Ku, et al. 2013) appear to be derived states specific to this lineage. Additionally, S. sabaudiense belongs to the basal group and has several distinct genomic features. First, S. sabaudiense has the highest GC content (30.2%) among the Spiroplasma genomes sequenced to date. For comparison, the other three mosquito-associated Spiroplasma species have a GC content of 23.9-26.4%. Considering that the S. apis genome has a GC content of 28.3% (Ku et al. 2014) and the more distantly related Spiroplasma species in the Chrysopicola clade have a GC content of 28.8-29.2%, it is possible that a relatively high GC content of ~28-30% represents the ancestral state of the Apis clade. Second, S. sabaudiense has one additional tRNA-Ser gene compared with other genomes in the Apis clade (Lo, Ku, et al. 2013; Ku et al. 2014), which may be the result of a lineage-specific gene duplication event. Finally, S. sabaudiense has two complete and identical rRNA operons. For comparison, all other characterized Spiroplasma genomes were found to have only one rRNA operon (Carle et al. 2010; Alexeev et al. 2012; Ku et al. 2013; Lo, Chen, et al. 2013; Lo, Ku, et al. 2013; Ku et al. 2014), while the Mycoplasma/Mesoplasma species in the sister Mycoides-Entomoplasmataceae clade have two rRNA operons. It is unclear if this pattern is due to two independent duplication events (i.e., one in the lineage leading to S. sabaudiense and one in the common ancestor of the Mycoides-Entomoplasmataceae clade) or a single duplication in the common ancestor of the Apis-Mycoides-Entomoplasmataceae clade, followed by one or more losses in the Apis clade. Future improvement in the taxon sampling of available complete genome sequences from the Apis and the Mycoides-Entomoplasmataceae clades is necessary to further investigate this issue.

Table 1
Genome Assembly Statistics

| Genome | S. diminutum CUAS-1ᵀ | S. taiwanense CT-1ᵀ | S. culicicola AES-1ᵀ | S. sabaudiense Ar-1343ᵀ |
| --- | --- | --- | --- | --- |
| GenBank accession | CP005076 | CP005074 | CP006681 | CP006934 |
| Chromosome size (bp) | 945,296 | 1,075,140 | 1,175,131 | 1,075,953 |
| GC content (%) | 25.5 | 23.9 | 26.4 | 30.2 |
| Coding density (%) | 92.7 | 82.5 | 92.2 | 90.0 |
| Protein-coding genes | 858 | 991 | 1,071 | 924 |
| Length distribution (Q1/Q2/Q3) (aa) | 177/283/443 | 137/247/397 | 176/283/437 | 189/296/455 |
| Hypothetical proteins | 310 | 452 | 460 | 368 |
| Annotated pseudogenes | 0 | 54 | 0 | 7 |
| rRNA genes/operons | 3/1 | 3/1 | 3/1 | 6/2 |
| tRNA genes | 29 | 29 | 29 | 30 |
| Plasmid (GenBank accession) | 0 | 1 (CP005075) | 0 | 0 |

When the chromosome of S. taiwanense is used as the reference for pairwise genome alignments, the patterns are consistent with the expectation based on the species phylogeny (Lo, Ku, et al. 2013) and the levels of sequence similarity (fig. 3). Compared with the most closely related S. diminutum (fig. 3A), the chromosomal organizations are largely conserved, except for the ~0.5-0.7 Mb region that contains the putative replication terminus at ~0.58 Mb and possibly involves an inversion. In contrast, the highly divergent S. sabaudiense exhibits low levels of sequence similarity and synteny conservation (fig. 3C).

Comparison of Gene Content and Substrate Utilization Strategies

A total of 1,634 homologous gene clusters were found among the four mosquito-associated Spiroplasma species (fig. 4 and supplementary table S1, Supplementary Material online). Among these, 552 are shared by all four species (>50% of all the protein-coding genes in each species). These core genes include those involved in essential cellular processes conserved among bacteria (Koonin 2003; Lapierre and Gogarten 2009; Chen et al. 2012), such as DNA replication, transcription, and translation. Furthermore, genes that have been suggested as shared between the Apis and the Citri clades within Spiroplasma (Lo, Chen, et al. 2013; Lo, Ku, et al. 2013), such as those involved in glucose uptake and utilization (ptsG and crr), fructose uptake and utilization (fruA and fruK), N-acetylglucosamine (GlcNAc) uptake and utilization (nagE, nagA, and nagB), glycolysis (pgi, pfkA, fbaA, gap, gapN, pgk, pgm, eno, and pyk; the dotted line in fig. 5), nucleotide biosynthesis (e.g., adk, apt, gmk, hprT, purA, purB, pyrG, pyrH, rdgB, tdk, thyA, tmk, and upp), the nonmevalonate pathway for isopentenyl pyrophosphate synthesis (dxr, dxs, ispD, ispE, ispF, ispG, and ispH), protection from oxidative stress (sufB, sufC, sufD, sufS, sufU, and tpx), and putative secreted proteins containing GH18 chitinase and SGNH hydrolase domains, are found to be conserved in these four species. Intriguingly, among these conserved genes, we observed several cases of lineage-specific gene family expansions. For example, in a previous analysis of the S. taiwanense genome (Lo, Ku, et al. 2013), three oligopeptide ABC transporter genes (oppC, oppD, and oppF) were found to have three copies each. Because these genes are single copy in the other three Spiroplasma genomes compared in this study, this observation is best explained by S. taiwanense-specific tandem duplications. In addition, the 6-phospho-beta-glucosidase gene (bgl) was found to exhibit a high level of copy number variation, ranging from a single copy in S. taiwanense, through three copies in S. diminutum and S. sabaudiense, to eight copies in S. culicicola.

Fig. 2. Chromosomal organization of the four mosquito-associated Spiroplasma species. Rings from the outside in: (1) scale marks (unit: Mb), (2 and 3) protein-coding genes on the forward and reverse strand, respectively (color coded by the functional categories), (4) GC skew (positive: green; negative: yellow), and (5) GC content (above average: orange; below average: blue; rRNA operons are labeled by black triangles).

Most of the species-specific genes are annotated as hypothetical proteins, such that we are unable to infer their functional significance. Among these four species, S. diminutum has the lowest number of species-specific gene clusters (fig. 4), possibly due to the fact that it has the smallest chromosome and the fewest protein-coding genes. Intriguingly, S. sabaudiense has a large family of species-specific hypothetical proteins with 27 copies. These genes are often found as clusters of two to four adjacent copies on the chromosome in regions with unexpected GC skew patterns (e.g., ~0.2, ~0.4, and ~1.0 Mb in fig. 2; the assembly in these regions has been verified by PCR), suggesting that these DNA segments have been integrated recently. However, because database searches provided no identifiable homolog or conserved protein domain, the function and the origin of these hypothetical proteins remain unknown.

Fig. 3. Pairwise genome alignments. The genome of S. taiwanense was used as the reference for all three alignments (red dots: matches on the same strand; blue dots: matches on the opposite strands). The sequence similarity levels were calculated based on 524 single-copy orthologous genes shared by these four species. The concatenated alignments contain 560,694 aligned nucleotide (nt) sites and 184,080 aligned amino acid (aa) sites, respectively. nt/aa similarity: (A) 76.6%/70.9%, (B) 73.3%/66.0%, and (C) 65.2%/54.2%.

Fig. 4. Distribution pattern of homologous gene clusters (total: 1,634 homologous gene clusters). The detailed lists of these gene clusters are provided in supplementary table S1, Supplementary Material online.

In the few cases where the functional roles of species-specific genes can be inferred, they reveal interesting information about the metabolic differences among these species. For example, S. sabaudiense is the only species that has the complete set of genes for arginine utilization (arcA, arcB, and arcC), which is consistent with previous biochemical tests (Abalain-Colloc et al. 1987; Hung et al. 1987; Abalain-Colloc et al. 1988; Williamson et al. 1996). Additionally, sucrose utilization (scrB and scrK) appears to be limited to S. diminutum, while glycerophosphocholine (GPC; substrate of glpU and glpQ) utilization appears to be limited to S. culicicola. Intriguingly, one putative secreted protein specific to S. culicicola (SCULI_v1c06250) was found to contain a partial Pfam03318 domain (Clostridium epsilon toxin ETX/Bacillus mosquitocidal toxin MTX2), which may contribute to its pathogenicity toward mosquitoes.

Other than the genes that are shared by all four species or specific to one of the species, genes with more variable phylogenetic distribution patterns are important for understanding these mosquito-associated bacteria (fig. 5 and supplementary table S1, Supplementary Material online). Two sets of genes are of particular interest because of the differences in pathogenicity toward mosquitoes observed in previous artificial infection experiments (Chastel and Humphery-Smith 1991; Humphery-Smith, Grulet, Chastel, et al. 1991; Humphery-Smith, Grulet, Le Goff, et al. 1991; Vorms-Le Morvan et al. 1991; Vazeille-Falcoz et al. 1994; Phillips and Humphery-Smith 1995). The two species without apparent pathogenicity (i.e., S. diminutum and S. sabaudiense) were found to share murP and murQ for the uptake and utilization of N-acetylmuramic acid (MurNAc) and nrdD for the conversion of CTP to dCTP (Fontecave et al. 1989). The two species that exhibit pathogenicity (i.e., S. culicicola and S. taiwanense) were found to share a copy of glpO for ROS production. This finding provides further support for our previous inference that glpO is likely a virulence factor in these mosquito-associated Spiroplasma species (Lo, Ku, et al. 2013). To provide the substrate for glpO, these two species both contain the genes coding for the sn-glycerol-3-phosphate ABC transporter (ugpA, ugpC, and ugpE) for direct import of G3P and glycerol kinase (glpK) for glycerol phosphorylation. Furthermore, a pseudogenized copy of glycerophosphoryl diester phosphodiesterase (glpQ) was found in the S. taiwanense genome (Lo, Ku, et al. 2013), suggesting that the metabolic capacity to utilize GPC was ancestral as well.

Molecular Evolution of the Glycerol Metabolism Genes 

Based on the comparison of their substrate utilization strategies (fig. 5), glycerol metabolism and the associated production of ROS are likely to be linked to the observed pathogenicity of S. culicicola and S. taiwanense in artificial infection experiments (Chastel and Humphery-Smith 1991; Humphery-Smith, Grulet, Chastel, et al. 1991; Humphery-Smith, Grulet, Le Goff, et al. 1991; Vazeille-Falcoz et al. 1994; Phillips and Humphery-Smith 1995). Our examination of the chromosomal locations of these glycerol metabolism genes revealed that the gene order of glpF-glpO-glpK is largely conserved among the Spiroplasma species with complete genome sequences available (fig. 6); S. culicicola represents the only exception due to the insertion of glpQ and glpU between glpO and glpK. For comparison, the gene order is glpO-glpK-glpF in the three Mycoplasma species belonging to the Mycoides-Entomoplasmataceae clade, which seems to be a derived state due to one or more rearrangements. Based on the phylogenetic distribution pattern of this gene cluster, its absence in S. diminutum, S. sabaudiense, and Me. florum is best explained by independent losses.

For a more detailed investigation of the molecular evolution of these genes, we compared the individual gene trees to the species phylogeny (fig. 7). Surprisingly, despite the conservation in gene order among the Spiroplasma species, all three gene trees support the clustering of S. taiwanense homologs with those from the Mycoides-Entomoplasmataceae clade. This unexpected conflict between gene order and gene phylogenies is difficult to explain. Future investigation that incorporates additional sequence data from more diverse lineages in the Apis and the Mycoides-Entomoplasmataceae clades is essential to confirm the gene phylogenies.

In the examination of gene order and gene phylogenies, we found several interesting points regarding the glycerol uptake facilitator protein gene (glpF). In addition to the copy adjacent to glpO and glpK, several Spiroplasma genomes contain a second copy of glpF in other regions of the chromosomes (fig. 6). In all cases, these isolated copies exhibit high levels of sequence divergence from other homologs (fig. 7B; the locus tags are assigned sequentially starting from dnaA and reflect their relative positions on the chromosome). The second copy from S. apis (SAPIS_v1c05780; located at ~0.69 Mb in fig. 6) was not included in the gene phylogeny because it was not grouped in the same homologous gene cluster with all other copies of glpF. This pattern of sequence divergence may be explained by the release from selective constraint for these redundant copies. For S. diminutum and perhaps also S. sabaudiense, glpF may be in the process of nonfunctionalization because the downstream glpK has been lost (fig. 5). Such gradual degradation of gene content is common among host-associated bacteria (Ochman and Davalos 2006; McCutcheon and Moran 2012) and is likely to be driven by a combination of mutational biases toward deletions and high levels of genetic drift (Mira et al. 2001; Kuo et al. 2009; Kuo and Ochman 2009; Kuo and Ochman 2010). Eventually, these Spiroplasma lineages may lose their glpF, just as has occurred in Me. florum.

Conclusions 

In summary, the gene content comparison presented in this study provides an overview of the substrate utilization strategies across diverse mosquito-associated Spiroplasma species. Moreover, our results demonstrate that glpO is conserved across diverse Spiroplasma lineages. The absence of glpO in S. diminutum and S. sabaudiense is best explained by independent losses and may be linked to the lack of pathogenicity in these two species. The clustering of the S. taiwanense glpF/glpO/glpK with those from the Mycoides-Entomoplasmataceae clade is an intriguing point that requires further investigation. Finally, future tool development for the genetic manipulation of these bacteria is necessary for the functional characterization of their putative virulence factors.

Supplementary Material 

Supplementary table S1 is available at Genome Biology and 
Evolution online (http://www.gbe.oxfordjournals.org/).